The goal is to gain some insight into the factors that affect players' overall rating.
This database contains data on more than 10,000 players and their attributes. It covers the seasons from 2008 to 2016 in 11 European countries, including every team in each country's lead league and the teams' attributes. The whole database is provided through Kaggle Datasets as a database file containing several linked tables.
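Since the data comes as linked tables, most of the analysis starts from a join. Here is a minimal sketch of that step using Python's built-in `sqlite3` module. It assumes the file is an SQLite database; the table names (`Player`, `Player_Attributes`), the key `player_api_id`, and the toy rows are assumptions for illustration, built in memory so the snippet runs without the real file.

```python
import sqlite3

# Toy in-memory stand-in for the database file. Table names, column names,
# and rows are assumptions for illustration, not taken from the real data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Player (player_api_id INTEGER PRIMARY KEY, player_name TEXT);
    CREATE TABLE Player_Attributes (
        player_api_id INTEGER REFERENCES Player(player_api_id),
        overall_rating INTEGER,
        potential INTEGER
    );
    INSERT INTO Player VALUES (1, 'Player A'), (2, 'Player B');
    INSERT INTO Player_Attributes VALUES (1, 70, 75), (1, 72, 75), (2, 65, 65);
""")

# Join the linked tables on their shared key: one row per rating record.
rows = conn.execute("""
    SELECT p.player_name, a.overall_rating, a.potential
    FROM Player AS p
    JOIN Player_Attributes AS a ON a.player_api_id = p.player_api_id
    ORDER BY p.player_api_id, a.overall_rating
""").fetchall()
conn.close()

for row in rows:
    print(row)
```

With the real file, the `:memory:` argument would be replaced by the path to the downloaded database, and the `CREATE`/`INSERT` statements dropped.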
This plot shows a roughly bell-shaped, slightly right-skewed curve for the age distribution, with a mode of 28 years.
This plot shows a normal curve, as expected, for the weight distribution, with a mode of about 170 lbs.
The attributes consistently show left-skewed, roughly bell-shaped distributions.
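The mode and the direction of skew read off the histograms above can also be checked numerically. The sketch below computes the mode and the population skewness (third central moment divided by the 1.5 power of the second); a positive value means a longer right tail, a negative value a longer left tail. The `ages` list is made-up toy data, not values from the database.

```python
from collections import Counter

def mode(values):
    """Most frequent value; on ties, the one encountered first wins."""
    return Counter(values).most_common(1)[0][0]

def skewness(values):
    """Population skewness m3 / m2**1.5; > 0 means a longer right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Toy ages (assumption): a few older players stretch the right tail.
ages = [22, 24, 25, 25, 26, 26, 26, 27, 28, 30, 33, 36]

print("mode:", mode(ages))
print("right-skewed:", skewness(ages) > 0)
```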
These plots indicate a normal distribution for the ordinal variables. They also show that most players prefer their right foot.
This plot shows a positive relationship between age and weight.
This plot clearly shows a non-linear positive relationship between age and overall rating, which peaks at about age 30, very close to the mode of the age distribution!
From this plot, it can be seen that the overall rating is affected mainly by two skills, potential and reactions. Ball control is related to both short passing and dribbling (which makes sense). All goalkeeping skills are negatively correlated with the other skills. A negative relationship is also observed between acceleration and strength, which highlights the trade-off between the two.
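The kind of pairwise correlation summarised above can be sketched as a Pearson correlation matrix. The version below is written by hand with the standard library to show what each cell in such a plot represents; the `pearson` helper, the column names, and the five toy rows are assumptions for illustration, not the actual data or the method used to draw the plot.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy columns (assumption): one skill rises with the rating, one falls.
players = {
    "overall_rating": [60, 65, 70, 75, 80],
    "reactions": [55, 63, 68, 77, 82],
    "gk_diving": [70, 20, 15, 12, 10],
}

# Correlation of every column with every other column.
names = list(players)
matrix = {a: {b: pearson(players[a], players[b]) for b in names} for a in names}

for a in names:
    print(a, [round(matrix[a][b], 2) for b in names])
```

Values close to 1 or -1 mark the strong positive and negative pairs; values near 0 mark pairs with no linear relationship.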
Perfect! There is a strong positive relationship between the two skills and the overall_rating.
Aha, it seems that the relation can be divided into two categories, one for the dark red area and one for the light red area. Finding the attributes related to those two categories needs more investigation, which might come in the multivariate section.
This distribution is normal, which points to the odd relation seen in the previous plot of the potential skill.
It seems that most players have medium attacking and defensive work rates, followed by those with a medium defensive rate and a high attacking rate. Surprisingly, many players have high attacking and defensive work rates.
All plots show the same normal effect on the overall rating.
It does not seem that the attacking work rate affects this.
Again, it has no effect, and the upper edge still exists in all plots. This may lead to the conclusion that there are many records in which the potential value was entered as equal to the overall rating by default.
It's 24%! This is a very large share of records for the potential to be exactly equal to the overall rating, which supports the idea that it might have been entered by default as the same value as the overall rating. Let's check this value for another attribute.
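A share like this can be computed with a small helper that counts the records where two fields hold exactly the same value. The function name `share_equal`, the field names, and the toy records are assumptions for illustration; the same helper can be pointed at any other attribute for comparison.

```python
def share_equal(records, left, right):
    """Fraction of records where both fields are present and exactly equal."""
    complete = [r for r in records
                if r[left] is not None and r[right] is not None]
    if not complete:
        return 0.0
    matches = sum(1 for r in complete if r[left] == r[right])
    return matches / len(complete)

# Toy records (assumption); the row with a missing value is ignored.
records = [
    {"overall_rating": 70, "potential": 70, "reactions": 66},
    {"overall_rating": 65, "potential": 72, "reactions": 65},
    {"overall_rating": 80, "potential": 80, "reactions": 78},
    {"overall_rating": 58, "potential": 66, "reactions": 60},
    {"overall_rating": 74, "potential": 79, "reactions": 70},
    {"overall_rating": None, "potential": 75, "reactions": 71},
]

print(f"{share_equal(records, 'overall_rating', 'potential'):.0%}")
print(f"{share_equal(records, 'overall_rating', 'reactions'):.0%}")
```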
Interestingly, in most plots the relation is linear, except for the medium-medium group, which accounts for a large share of the data, as the bivariate charts indicate. So it is preferable to split the data as above rather than draw a general conclusion about the effect of age on a player's overall rating.
This chart shows no correlation between the two work rates and the overall rating, as the average stays almost constant at a score of 70. As further evidence that there is no relation, the low-low group (the rightmost column) shows an average overall rating of 70, which should be lower if there were a relation. But the previous chart shows that the work rate does affect the relation between age and overall rating, which becomes more linear as the work rates move away from medium.
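The comparison across work-rate groups comes down to averaging the overall rating within each (attacking, defensive) pair. A minimal sketch of that grouping follows; the function name `mean_rating_by_group`, the field names, and the toy records are assumptions for illustration and do not reproduce the averages in the chart.

```python
from collections import defaultdict

def mean_rating_by_group(records):
    """Mean overall rating per (attacking, defensive) work-rate pair."""
    totals = defaultdict(lambda: [0, 0])  # key -> [sum of ratings, count]
    for r in records:
        key = (r["attacking_work_rate"], r["defensive_work_rate"])
        totals[key][0] += r["overall_rating"]
        totals[key][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}

# Toy records (assumption), not taken from the database.
records = [
    {"attacking_work_rate": "medium", "defensive_work_rate": "medium", "overall_rating": 68},
    {"attacking_work_rate": "medium", "defensive_work_rate": "medium", "overall_rating": 72},
    {"attacking_work_rate": "high", "defensive_work_rate": "medium", "overall_rating": 74},
    {"attacking_work_rate": "low", "defensive_work_rate": "low", "overall_rating": 66},
    {"attacking_work_rate": "low", "defensive_work_rate": "low", "overall_rating": 70},
]

means = mean_rating_by_group(records)
for group, mean in sorted(means.items()):
    print(group, mean)
```

If the group means barely differ, as the chart suggests for the real data, the work rates on their own say little about the overall rating.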
As a final note, a player's overall rating is affected by several factors, including age and the potential and reactions skills. These should be the focus of further inferential studies to tell whether the factors are significant.